114 research outputs found

    SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications

    Summary: The Smith-Waterman (SW) algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools, but current implementations are either designed as monolithic protein database searching tools or are embedded into other tools. To facilitate easy integration of the fast Single Instruction Multiple Data (SIMD) SW algorithm into third-party software, we wrote a C/C++ library, which extends Farrar's Striped SW (SSW) to return alignment information in addition to the optimal SW score. Availability: SSW is available both as a C/C++ software library and as a stand-alone alignment tool wrapping the library's functionality at https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library Contact: [email protected]. 3 pages, 2 figures
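To make concrete the difference between returning only the optimal SW score and returning alignment information as well, here is a minimal sketch of the textbook Smith-Waterman recurrence with a traceback. It is not the SSW library's API and it is not the striped SIMD variant: it is a plain quadratic-time dynamic program, and the function name, the scoring values (match 2, mismatch -1, gap -2), and the linear gap penalty are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Illustrative only: linear gap penalty, O(len(a) * len(b)) time and memory.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `smith_waterman("TTACGTTT", "GGACGGG")` returns the score together with the two aligned substrings, which is the kind of extra information a score-only implementation would not give a caller.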

    Analysis of concordance of different haplotype block partitioning algorithms

    BACKGROUND: Different classes of haplotype block algorithms exist and the ideal dataset to assess their performance would be to comprehensively re-sequence a large genomic region in a large population. Such data sets are expensive to collect. Alternatively, we performed coalescent simulations to generate haplotypes with a high marker density and compared block partitioning results from diversity based, LD based, and information theoretic algorithms under different values of SNP density and allele frequency. RESULTS: We simulated 1000 haplotypes using the standard coalescent for three world populations – European, African American, and East Asian – and applied three classes of block partitioning algorithms – diversity based, LD based, and information theoretic. We assessed algorithm differences in number, size, and coverage of blocks inferred under different conditions of SNP density, allele frequency, and sample size. Each algorithm inferred blocks differing in number, size, and coverage under different density and allele frequency conditions. Different partitions had few if any matching block boundaries. However, they still overlapped, and a high percentage of the total chromosomal region was common to all methods. This percentage was generally higher with a higher density of SNPs and when rarer markers were included. CONCLUSION: A gold standard definition of a haplotype block is difficult to achieve, but collecting haplotypes covered with a high density of SNPs, partitioning them with a variety of block algorithms, and identifying regions common to all methods may be the best way to identify genomic regions that harbor SNP variants that cause disease
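The idea of a chromosomal region "common to all methods" can be sketched as an interval intersection. The snippet below is an illustration under stated assumptions, not the authors' procedure: blocks are represented as half-open `(start, end)` coordinate pairs, blocks within one method do not overlap each other, and the function name and example coordinates are hypothetical.

```python
def common_coverage(partitions, region_length):
    """Fraction of the region lying inside a block under every partition.

    partitions: one list of (start, end) half-open intervals per method.
    Assumes blocks within a single method do not overlap.
    """
    def intersect(xs, ys):
        # Two-pointer sweep over two sorted, non-overlapping interval lists.
        xs, ys = sorted(xs), sorted(ys)
        out, i, j = [], 0, 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                out.append((lo, hi))
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
        return out

    common = partitions[0]
    for blocks in partitions[1:]:
        common = intersect(common, blocks)
    return sum(end - start for start, end in common) / region_length


# Hypothetical block calls from three methods on a 100-unit region.
methods = [
    [(0, 50), (60, 100)],
    [(10, 40), (45, 90)],
    [(0, 30), (35, 100)],
]
shared = common_coverage(methods, 100)
```

In this made-up example no block boundary is shared by all three methods, yet a majority of the region still falls inside a block under each of them, which mirrors the qualitative observation in the abstract.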

    Whole genome profiling of spontaneous and chemically induced mutations in Toxoplasma gondii

    BACKGROUND: Next generation sequencing is helping to overcome limitations in organisms less accessible to classical or reverse genetic methods by facilitating whole genome mutational analysis studies. One traditionally intractable group, the Apicomplexa, contains several important pathogenic protozoan parasites, including the Plasmodium species that cause malaria. Here we apply whole genome analysis methods to the relatively accessible model apicomplexan, Toxoplasma gondii, to optimize forward genetic methods for chemical mutagenesis using N-ethyl-N-nitrosourea (ENU) and ethylmethane sulfonate (EMS) at varying dosages. RESULTS: By comparing three different lab-strains we show that spontaneously generated mutations reflect genome composition, without nucleotide bias. However, the single nucleotide variations (SNVs) are not distributed randomly over the genome; most of these mutations reside either in non-coding sequence or are silent with respect to protein coding. This is in contrast to the random genomic distribution of mutations induced by chemical mutagenesis. Additionally, we report a genome wide transition vs transversion ratio (ti/tv) of 0.91 for spontaneous mutations in Toxoplasma, with a slightly higher rate of 1.20 and 1.06 for variants induced by ENU and EMS respectively. We also show that in the Toxoplasma system, surprisingly, both ENU and EMS have a proclivity for inducing mutations at A/T base pairs (78.6% and 69.6%, respectively). CONCLUSIONS: The number of SNVs between related laboratory strains is relatively low and managed by purifying selection away from changes to amino acid sequence. From an experimental mutagenesis point of view, both ENU (24.7%) and EMS (29.1%) are more likely to generate variation within exons than would naturally accumulate over time in culture (19.1%), demonstrating the utility of these approaches for yielding proportionally greater changes to the amino acid sequence. 
    These results will not only direct the methods of future chemical mutagenesis in Toxoplasma, but also aid in designing forward genetic approaches in less accessible pathogenic protozoa. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-354) contains supplementary material, which is available to authorized users
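For readers unfamiliar with the transition vs transversion ratio (ti/tv) quoted above, here is a minimal sketch of how such a ratio is computed from a list of single nucleotide substitutions. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes (A<->G, C<->T); every other substitution is a transversion. The function name and the example variants are hypothetical and are not data from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def titv_ratio(substitutions):
    """Return transitions / transversions for (ref, alt) base pairs."""
    transitions = transversions = 0
    for ref, alt in substitutions:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution, ignore
        same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions


# Hypothetical SNVs: two transitions (A>G, C>T) and two transversions.
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
ratio = titv_ratio(example)
```

Because there are four possible transitions and eight possible transversions, a ratio near 0.5 is what uniformly random substitutions would give; values such as those reported above sit above that baseline.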

    Tangram: A comprehensive toolbox for mobile element insertion detection

    © 2014 Wu et al.; licensee BioMed Central Ltd. Background: Mobile elements (MEs) constitute greater than 50% of the human genome as a result of repeated insertion events during human genome evolution. Although most of these elements are now fixed in the population, some MEs, including Alu, L1, SVA and HERV-K elements, are still actively duplicating. Mobile element insertions (MEIs) have been associated with human genetic disorders, including Crohn's disease, hemophilia, and various types of cancer, motivating the need for accurate MEI detection methods. To comprehensively identify and accurately characterize these variants in whole genome next-generation sequencing (NGS) data, a computationally efficient detection and genotyping method is required. Current computational tools are unable to call MEI polymorphisms with sufficiently high sensitivity and specificity, or call individual genotypes with sufficiently high accuracy. Results: Here we report Tangram, a computationally efficient MEI detection program that integrates read-pair (RP) and split-read (SR) mapping signals to detect MEI events. By utilizing SR mapping in its primary detection module, a feature unique to this software, Tangram is able to pinpoint MEI breakpoints with single-nucleotide precision. To understand the role of MEI events in disease, it is essential to produce accurate individual genotypes in clinical samples. Tangram is able to determine sample genotypes with very high accuracy. Using simulations and experimental datasets, we demonstrate that Tangram has superior sensitivity, specificity, breakpoint resolution and genotyping accuracy, when compared to other, recently developed MEI detection methods. Conclusions: Tangram serves as the primary MEI detection tool in the 1000 Genomes Project, and is implemented as a highly portable, memory-efficient, easy-to-use C++ computer program, built under an open-source development model

    A standard variation file format for human genome sequences

    Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment
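Since GVF extends GFF3, a GVF record can be read as a GFF3-style line with nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), where the last column holds `key=value` pairs separated by semicolons. The following is a rough sketch of such a reader, not an official parser: the function name is hypothetical, the sample line is made up for illustration, and URL-escaped characters and header pragmas are deliberately not handled.

```python
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gvf_line(line):
    """Parse one GFF3-style feature line into a dict (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Attributes: 'key=value' pairs separated by ';', values may be comma lists.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key] = value.split(",")
    record["attributes"] = attributes
    return record


# Made-up example record; the attribute names follow the GVF convention of
# naming the variant and reference sequences, but the values are invented.
sample = "chr16\texample_caller\tSNV\t49291141\t49291141\t.\t+\t.\tID=ID_1;Variant_seq=A,G;Reference_seq=G"
rec = parse_gvf_line(sample)
```

Keeping the type column as a plain string leaves room for checking it against Sequence Ontology terms in a separate step, which is where the format described above gets its controlled vocabulary.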

    Sequence analysis and characterization of active human Alu subfamilies based on the 1000 Genomes pilot project

    © The Author(s) 2015. The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic young Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the Alu Y lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages

    DESIGNING A VISUAL NOVEL ADAPTATION OF THE FOLKTALE "MURTADO MACAN KEMAYORAN"

    2016. Fatommy Ariadhi. This final project report is titled “Designing a Visual Novel Adaptation of the Folktale Murtado Macan Kemayoran”. The problems examined are: (1) How can a visual novel adaptation of the folktale “Murtado Macan Kemayoran” be designed so that it is appealing, becomes known, and attracts interest among people in Jakarta and beyond? (2) How can appropriate, effective, and efficient supporting media be designed to introduce the visual novel adaptation of the folktale “Murtado Macan Kemayoran” to people in Jakarta and beyond? The aim of this design project is to provide an alternative means of learning folktales, namely through a visual novel game. Through a fresh and different new medium, it is hoped that Indonesians, especially young people, will become more interested in learning the folktales of Indonesia

    Order and the Virtual: Toward a Deleuzian Cosmology

    No abstract provided; the following is taken from the "Introduction": Order is a more or less explicit topic for any thinker who undertakes to write about nature. Even those who assert that randomness or chaos is the most fundamental trait of nature are obliged to account for the apparent permanence, organisation and structure we observe around us. No less is true for Gilles Deleuze, who champions the power of chaos throughout his work. On one reading, Deleuze’s chief impulse is to wrench loose the lynchpins of order; to ‘affirm chaos’ and disarticulate the law of excluded middle; to refuse jurisdiction to laws of nature and render provisional its every constant; to banish identity and negation alike. If we are to be left with no fixed point, we might ask, what remains of order? This study is nevertheless an examination of that notion in Deleuze’s natural philosophy. For me the counter-reading is much more productive and insightful. Deleuze is rather a firm believer in order, even where he affirms chaos. If we could furnish a ‘Deleuzian Question’ par excellence, it would be: ‘Given that there are no fixed points, how is order expressed in the world?’ This question is implicitly reprised across the entirety of his work and inflected at each stage by fresh vocabulary coined to treat it anew, as though for each new Deleuzian territory a new phrasebook is required